Fix strip spec-mandated DV blob framing before deserializing #3576

Merged: Fokko merged 1 commit into apache:main from ebyhr:ebi/puffin-offset on Jun 30, 2026

@ebyhr ebyhr commented Jun 28, 2026

Rationale for this change

Fix _payload storing puffin[8:] instead of puffin: blob offsets
in the footer are file-relative (from byte 0), not relative to byte 8.
The two bugs cancelled each other out for single-blob files at offset 4;
the fix was only testable once PuffinFile supported multi-blob/compressed files.
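
To illustrate why the base of the slice matters, here is a minimal sketch. The byte layout, blob contents, and the `read_blob` helper are hypothetical and exist only for this example; this is not the pyiceberg implementation and not a valid Puffin file.

```python
# Toy buffer: a 4-byte magic followed by two made-up blobs.
# This only mimics "header bytes, then blobs"; it is not a real Puffin file.
MAGIC = b"PFA1"
blob_a = b"first-blob"
blob_b = b"second-blob"
data = MAGIC + blob_a + blob_b

# Offsets as a footer would record them: counted from byte 0 of the file.
offset_a = len(MAGIC)
offset_b = offset_a + len(blob_a)


def read_blob(buf: bytes, offset: int, length: int) -> bytes:
    """Hypothetical helper: slice one blob out of a buffer."""
    return buf[offset : offset + length]


# Slicing the whole file with the file-relative offset returns the blob.
correct = read_blob(data, offset_b, len(blob_b))

# Slicing a buffer that already dropped its first 8 bytes shifts every read by 8.
shifted = read_blob(data[8:], offset_b, len(blob_b))

print(correct)
print(shifted)
```

With more than one blob, any fixed shift of the buffer shows up as a wrong read for every blob past the first, which is consistent with the note above that the fix only became testable with multi-blob files.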

Also widen PuffinBlobMetadata.type from Literal["deletion-vector-v1"]
to str so PuffinFile can parse files containing non-DV blobs without
a Pydantic validation error.
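
A rough sketch of the difference, using plain Pydantic `BaseModel` classes rather than pyiceberg's `IcebergBaseModel`. The class names are made up for illustration, and `apache-datasketches-theta-v1` is used here only as an example of a non-DV blob type.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class StrictBlobMetadata(BaseModel):
    # Hypothetical stand-in for the old field: only one type string is accepted.
    type: Literal["deletion-vector-v1"]


class RelaxedBlobMetadata(BaseModel):
    # Hypothetical stand-in for the new field: any string is accepted.
    type: str


non_dv = {"type": "apache-datasketches-theta-v1"}

try:
    StrictBlobMetadata(**non_dv)
    strict_rejected = False
except ValidationError:
    # The Literal field rejects any blob type other than the DV one.
    strict_rejected = True

relaxed = RelaxedBlobMetadata(**non_dv)
print(strict_rejected, relaxed.type)
```

The trade-off is that callers interested only in deletion vectors would need to check the blob type themselves once the field accepts any string.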

Are these changes tested?

Yes. The test files were copied from https://github.com/apache/iceberg/tree/main/core/src/test/resources/org/apache/iceberg/puffin/v1
The remaining file, sample-metric-data-compressed-zstd.bin, will be added separately in #3575.

Are there any user-facing changes?

No

@ebyhr ebyhr force-pushed the ebi/puffin-offset branch from 98ddd63 to 7b1cee9 Compare June 28, 2026 11:20
Comment thread pyiceberg/table/puffin.py

 class PuffinBlobMetadata(IcebergBaseModel):
-    type: Literal["deletion-vector-v1"] = Field()
+    type: str = Field()

What's the reason behind this change? I think it would be better to be strict in what we allow as a type. WDYT?

@ebyhr ebyhr Jun 30, 2026

Otherwise, we can't test sample-metric-data-uncompressed.bin, which was downloaded from the Iceberg Java repository. It doesn't contain a deletion-vector-v1 blob.

I understand the motivation behind the existing guard, but the Puffin file format itself doesn't restrict the blob types a file can contain.

I see, that's reasonable. Thanks 👍

@Fokko Fokko left a comment

Thanks @ebyhr for working on this, thanks 👍

@Fokko Fokko merged commit 41276a3 into apache:main Jun 30, 2026
17 checks passed